2024-08-15 14:53:25 · AIbase · 11.1k
OpenAI Launches SWE-bench Verified: Enhancing AI Software Engineering Capability Assessment
OpenAI has released SWE-bench Verified, a benchmark intended to assess AI performance on software engineering tasks more accurately. It addresses limitations of the original SWE-bench, such as overly strict unit tests, ambiguous problem descriptions, and development environments that are difficult to set up. By introducing containerized Docker environments, the new benchmark makes evaluation more consistent and reliable, and AI models score significantly higher on it. GPT-4o solved 33.2% of the samples under the new benchmark, while the best open-source agent framework has...